% perm filename IANPRV.TEX[B2,JMC]
% filedate 1984-07-24
% use this file for experimental typesetting using the `boo' macros
% \pagelayout{boo}
\input{boo.plo[b2,jmc]}
% \documentstyle{boo,boo11}
\input{boo.sty[b2,jmc]}
\input{boo11.sty[b2,jmc]}
\input boomac.tex[b2,jmc]
\nofiles
\begin{document}
In this section we provide a succinct introduction to first order
logic, a tool that we shall use throughout this chapter to express properties of
programs and facts concerning our data domain. The reason for this formalism is
essentially twofold. Firstly, it provides us with a convenient framework to work
within, one that allows for the possibility of computer-checked proofs
and automated theorem proving. Secondly, it enables us to
accurately specify the principles that we recognize as correct; by
delineating them we specify what is to count as a proof and what is not.
\hbox to\hsize{\hss{\bf The Subject Matter}\hss}
Our aim is to describe the first order logic of a particular
fixed structure $\Re$. The one we have in mind is that of pure \lisp, which consists
of
1. Domains: SEXPS, LISTS, INTEGERS, SYMBOLS $\ldots$
2. Constants: |NIL|, |T|, 0, 1, 2, 3, $\ldots$
3. Operations: \qa, \qd, \qcons, \qeq, \qat, +, -, $<, >, \ldots$
\noindent However in this section we shall work with an arbitrary structure $\Re$ that consists
of
1. Domains: $A_1$, $A_2 \ldots A_n$
2. Constants: $c_1, c_2, \ldots c_m$
3. Operations: $f_1, f_2, \ldots f_s$
4. Predicates: $A_1P, A_2P \ldots A_nP$
\noindent Some remarks are in order. Firstly, each $A_i$ is assumed to be a non-empty set of
objects. Secondly, each operation $f_i$ is assumed to have an arity, namely the number of
arguments that it expects. In general it is also the case that the arguments must
be of a particular nature. For example the operation $f_i$ may be such that
$$f_i : A_{i_1} \times A_{i_2} \times \ldots \times A_{i_k} \longrightarrow A_{i_{k+1}}$$
\noindent which describes what sort of arguments $f_i$ requires and what sort of
values it returns. Let us leave aside the question of what happens when we feed
an operation of this sort the wrong sorts of arguments; we shall return to that later. In the meantime
the worried reader is advised to think of the operations as total, as indeed they will,
in the case of pure \lisp, turn out to be. The predicate $A_iP$ is true of an
element $x$ of the domains if and only if $x \in A_i$.
Finally, each constant is assumed to belong to at least one of the sorts.
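To make the notion of a multi-sorted structure concrete, here is a minimal Python sketch of the pure \lisp\ structure just described. The representation choices (Python strings for atoms, a {\tt Cons} class for dotted pairs, the function names) are our own illustrative assumptions, not part of the text.

```python
# Illustrative sketch of the pure LISP structure: domains, constants,
# operations and the ATOM predicate. Atoms are Python strings;
# dotted pairs are Cons cells.

class Cons:
    """A dotted pair (car . cdr); anything that is not a Cons is an atom."""
    def __init__(self, car, cdr):
        self.car, self.cdr = car, cdr
    def __eq__(self, other):
        return (isinstance(other, Cons)
                and self.car == other.car and self.cdr == other.cdr)

def atom(x):                     # the \qat predicate
    return not isinstance(x, Cons)

def car(x):                      # the \qa operation
    return x.car

def cdr(x):                      # the \qd operation
    return x.cdr

def cons(a, d):                  # the \qcons operation
    return Cons(a, d)
```

For example, {\tt cons("A", "B")} plays the role of {\sx (A . B)}, and {\tt atom} distinguishes it from the atoms {\tt "A"} and {\tt "NIL"}.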
Our version of first order logic is somewhat non-standard, as is our notion of
a multi-sorted structure: our logic is different in that we allow lambda terms, and our
notion of structure does not require that the domains be disjoint.
These differences are inessential and we shall not draw
attention to them again.
\hbox to\hsize{\hss{\bf The Language}\hss}
Corresponding to our structure $\Re$ is its first order language, out of which
we shall construct more complex syntactic
expressions. These can be divided into two types,
logical and non-logical, the difference being that the latter depend upon the nature
of $\Re$ while the former do not. In our case they are
\noindent{\bf Logical: }
$$ \forall, \exists, \neg,\land, \lor, \to, ), (, \lambda$$
\noindent{\bf Non-logical: }
1. For each sort $A_i$ a countable set of variables $x_i, y_i, z_i,\ldots$
2. A countable collection of general variables $x, y, z, \ldots $
3. For each constant $c_i$ a constant symbol $\hat c_i$.
4. For each operation $f_i$ an operation symbol $\hat f_i$.
5. For each predicate $A_iP$ a predicate symbol $\hat {A_iP}$.
In usual treatments of first order logic one takes great pains to distinguish
between a constant or operation symbol and the corresponding constant or operation.
One of the advantages of working with a fixed domain is that the importance of
this distinction diminishes. Consequently from here on we shall not distinguish
between them, leaving context to do the work of $\hat{}$.
Using this language we define the terms, function expressions,
predicate expressions and formulas
of the structure $\Re$. The terms and function expressions are defined by mutual
recursion as follows.
\noindent{\bf Terms:}
1. Constant symbols and variables are terms.
2. If $t_1,t_2$ and $t_3$ are terms then $[\qif t_1 \qthen t_2 \qelse t_3]$
is a term.
3. If $\Psi$ is an n-ary function expression and $t_1, t_2 \ldots t_n$ are
terms then $\Psi [t_1,t_2, \dots t_n]$ is a term.
\noindent{\bf Function expressions:}
1. An m-ary operation symbol is an m-ary function expression.
2. If $\chi_1, \chi_2, \ldots \chi_m$ are general variables and $t$ is a term
then $[\lambda \chi_1, \ldots \chi_m .t]$ is an m-ary function expression.
The idea here is that variables of each sort are interpreted as
ranging over their particular domain ($x_i$ over $A_i$) and the general
variables as ranging over all the domains. Thus terms are meant to be
interpreted as expressions that name elements of $\Re$ and function
expressions name operations on the domains. For example, in the case of
pure \lisp, some examples of terms are
(i) |NIL|
(ii) $x$
(iii) $\qa x$
(iv) $\qif \qat x \qthen x \qelse \qd x$
\noindent while examples of function expressions are
(v) \qa
(vi) $[\lambda \chi . \qif \qat \chi \qthen \chi \qelse \qd \chi]$
The predicate expressions and formulas in the first order language of $\Re$
are also defined by mutual recursion.
\noindent {\bf Predicate expressions:}
1. The predicate symbols $A_iP$ are unary predicate expressions.
2. If $\chi_1 \ldots \chi_n$ are general variables and $\phi$ is a formula then
$[\lambda \chi_1 \ldots \chi_n . \phi]$ is an n-ary predicate expression.
\noindent{\bf Formulas:}
1. If $t_1$ and $t_2$ are terms then $t_1 = t_2$ is a formula.
2. If $\Phi$ is an n-ary predicate expression and $t_1 \ldots t_n$ are terms
then $\Phi [ t_1 \ldots t_n]$ is a formula.
3. If $\phi_1$ and $\phi_2$ are formulas then so are $[\neg \phi_1], [\phi_1 \land
\phi_2], [\phi_1 \lor \phi_2]$ and $[\phi_1 \to \phi_2]$.
4. If $\phi$ is a formula and $\chi$ is a variable then
$\forall \chi \phi$ and
$\exists \chi \phi$
are formulas.
Some examples of formulas, in the case of pure \lisp, are
(vii) $\qa x = \qd x$
(viii) $\qat x = |T|$
(ix) $[[\neg \qat x] = |T|] \to \qa x = \qd x$
(x) $\forall x [ [\neg \qat x = |T|] \to [\exists y \exists z [ x = y \qcons z]]]$
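The mutually recursive definitions of terms and formulas can be mirrored directly as a small abstract syntax. The following Python sketch is illustrative only; the class names are our own assumptions.

```python
# Terms and formulas as dataclasses, following the grammar in the text.
from dataclasses import dataclass
from typing import List

# --- terms ---
@dataclass
class Var:
    name: str                    # a variable

@dataclass
class Const:
    name: str                    # a constant symbol

@dataclass
class App:
    fn: object                   # a function expression
    args: List[object]           # Psi[t1, ..., tn]

@dataclass
class If:
    test: object                 # [if t1 then t2 else t3]
    then: object
    other: object

# --- formulas ---
@dataclass
class Eq:
    left: object                 # t1 = t2
    right: object

@dataclass
class Not:
    body: object

@dataclass
class Imp:
    left: object
    right: object

@dataclass
class Forall:
    var: str
    body: object

# The example formula "car x = cdr x" from the list above:
example = Eq(App(Const("car"), [Var("x")]),
             App(Const("cdr"), [Var("x")]))
```

The mutual recursion in the grammar shows up as {\tt App} containing function expressions inside terms, and quantified formulas containing formulas.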
We say that a variable $x$ is bound if it is in the lexical (textual) scope of a quantifier
of the form $\exists x$ or $\forall x$, or else occurs in the body $t$ of a lambda
expression of the form $\lambda y_1,\ldots x, \ldots y_k . t$. To be perfectly
precise we should talk about bound occurrences of variables; we presume the reader is
familiar with these concepts. An occurrence of a variable that is not bound is said
to be free. A sentence is a formula which contains no free occurrences of variables.
\hbox to\hsize{\hss{\bf The Semantics}\hss}
The semantics or meaning of these terms, function and predicate expressions
and formulas
can now be explained. The first thing to observe is that although variables are
terms and terms are supposed to denote elements of our structure, there is very
little meaning one can give to a single solitary variable. It is for this reason
that most of the expressions listed above only have a meaning with respect to a
particular variable assignment. In our case we have sorted variables, so we shall
spell out in detail what, in our situation, a variable assignment is.
A {\bf variable assignment} $\nu$ is a function that assigns to each variable
(whether sorted or general) an element of one of the domains of $\Re$.
Furthermore it must do so in such a
fashion that if a variable is of a particular sort then its value under the
assignment must belong to the corresponding domain. In other words
$\nu(x_i) \in A_i$ for each variable $x_i$ of sort $A_i$. There are no restrictions
on the particular domain that a general variable's value must lie in.
Given a particular assignment $\nu$ we can give meanings to the terms, function
expressions and formulas that we have described. We do this by induction on the
way they are constructed. We begin with terms.
\noindent{\bf Terms:}
The meaning of a constant symbol is the constant that it denotes, while the
meaning of a variable is the value that $\nu$ assigns to it.
If $\Psi$ is a function expression taking $n$ arguments, and $t_1 \ldots t_n$
are terms, then the meaning of $\Psi[t_1 \ldots t_n]$ is the value
of the function assigned to $\Psi$ when applied to the values assigned to
the terms $t_1 \ldots t_n$. If $t_1, t_2$ and $t_3$ are terms and
$t_1$ is assigned a value other than |NIL|, then the value assigned to
$\qif t_1 \qthen t_2 \qelse t_3$
is
the value assigned to $t_2$; otherwise, if $t_1$ is assigned the value |NIL|,
then the value of the whole term is the value assigned to $t_3$.
Given a term $t$ we write $t↑\nu$ to denote the object it refers to under the
assignment $\nu$.
For example, in the case of pure \lisp, if the value assigned to $x$ is {\sx (A . B)}
then the meanings of the terms given in examples (i)-(iv) above are:
(i) |NIL|
(ii) {\sx (A . B)}
(iii) |A|
(iv) |B|
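The evaluation of terms relative to an assignment $\nu$ can be sketched as a short recursive function. The encoding below (tuples tagged with an operator name for compound terms, tuples for dotted pairs, strings for atoms and variables) is our own assumption, used only to reproduce the four example values above.

```python
# A sketch of term evaluation under a variable assignment nu.
def atom(x):
    return not isinstance(x, tuple)      # dotted pairs are tuples

def eval_term(t, nu):
    """t is a constant/variable name (str) or an operator-tagged tuple."""
    if isinstance(t, str):
        return nu.get(t, t)              # variables looked up; constants name themselves
    op = t[0]
    if op == "car":
        return eval_term(t[1], nu)[0]
    if op == "cdr":
        return eval_term(t[1], nu)[1]
    if op == "atom":
        return "T" if atom(eval_term(t[1], nu)) else "NIL"
    if op == "if":                       # [if t1 then t2 else t3]
        return (eval_term(t[2], nu) if eval_term(t[1], nu) != "NIL"
                else eval_term(t[3], nu))
    raise ValueError(op)

nu = {"x": ("A", "B")}                   # x is assigned (A . B)
print(eval_term("NIL", nu))                                     # NIL
print(eval_term("x", nu))                                       # ('A', 'B')
print(eval_term(("car", "x"), nu))                              # A
print(eval_term(("if", ("atom", "x"), "x", ("cdr", "x")), nu))  # B
```

The four prints correspond to the meanings of the example terms (i)-(iv).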
\noindent{\bf Function expressions:}
Operation symbols
are assigned the operations they denote.
The expression $[\lambda \chi_1, \ldots \chi_n .t]$ is assigned the function
that is defined as follows. The value
of $[\lambda \chi_1 \ldots \chi_n.t ][t_1 \ldots t_n ]$ is the value assigned to
$t$ relative to an interpretation which is modified by assigning the
values of the $t_i$s to the corresponding $\chi_i$s.
Given a function expression $\Psi$ we write $\Psi↑\nu$ to denote the function
it refers to under the particular assignment $\nu$.
The reason for only allowing $\lambda$-binding
of general variables is that if we allowed binding of sorted variables
then some applications would not result in well defined variable
assignments: a variable of a particular sort might be bound
to a value that did not lie in the appropriate domain.
Another approach would be
to allow $\lambda$-binding of any variable, but only allow application of
such expressions to terms of the appropriate sort. This makes
the class of well-formed expressions undecidable, which is undesirable
if we wish to represent our system in a computer.
\noindent {\bf Predicate expressions:}
We have already specified the meanings of the predicates $A_iP$: they simply
tell us whether or not an element belongs to the corresponding domain $A_i$.
The general situation is very similar to function expressions.
The predicate $[\lambda \chi_1 \ldots \chi_n . \phi]$ is true of the terms
$t_1 \ldots t_n$ under the assignment $\nu$ iff the formula $\phi$ is true
at the assignment that differs from $\nu$ only in that it assigns the
value $t_i↑\nu$ to the variable $\chi_i$, for $i = 1 \ldots n$. If $\Phi$
is a predicate expression then we write $\Phi↑\nu$ for the resulting
predicate under the assignment $\nu$.
\noindent{\bf Formulas:}
The meaning of a formula is the truth value assigned to it under the
variable assignment $\nu$. When a formula $\phi$ is true (in $\Re$) under the
assignment $\nu$ we write $\Re \models \phi[\nu]$. Thus by defining the
relation $\models$ (called the satisfaction relation) we define the meanings of
the formulas. The definition is by induction on the construction of the formula
$\phi$.
1. If $\phi = [t_1 = t_2]$ then $\Re \models \phi [\nu]$ iff $t↑\nu_1 = t↑\nu_2$.
2. If $\phi = \Phi[t_1 \ldots t_n]$ then $\Re \models \phi [\nu]$ iff
$\Phi↑\nu$ is true of $t_1↑\nu \ldots t_n↑\nu$.
3. If $\phi = [\neg \psi]$ then
$\Re \models \phi [\nu]$ iff
$\Re \not\models \psi [\nu]$
4. If $\phi = [\theta_1 \land \theta_2]$ then
$\Re \models \phi [\nu]$ iff
$\Re \models \theta_1[\nu]$ and
$\Re \models \theta_2[\nu]$.
5. If $\phi = [\theta_1 \lor \theta_2]$ then
$\Re \models \phi [\nu]$ iff
$\Re \models \theta_1[\nu]$ or
$\Re \models \theta_2[\nu]$.
6. If $\phi = [\theta_1 \to \theta_2]$ then
$\Re \models \phi [\nu]$ iff
$\Re \not\models \theta_1[\nu]$ or
$\Re \models \theta_2[\nu]$.
7. If $\phi = \forall \chi \psi$ then
$\Re \models \phi [\nu]$ iff
for every element $x$ in the domain that the variable $\chi$ ranges over,
$\Re \models \psi [\nu↑x_\chi]$,
where $\nu↑x_\chi$ is the variable
assignment exactly like $\nu$ except that it assigns the value $x$ to
the variable $\chi$.
8. If $\phi = \exists \chi \psi$ then
$\Re \models \phi [\nu]$ iff
there is some element $x$ in the domain that the variable $\chi$ ranges over
such that $\Re \models \psi [\nu↑x_\chi]$.
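The inductive clauses for $\models$ translate almost line for line into a recursive truth-evaluator. The sketch below is a simplification under stated assumptions: formulas are operator-tagged tuples, only variables appear on either side of an equality, and every variable ranges over one finite domain.

```python
# A sketch of the satisfaction relation |= for the clauses above.
def holds(phi, nu, domain):
    """Does the structure satisfy formula phi under assignment nu?"""
    op = phi[0]
    if op == "eq":                        # clause 1, variables only
        return nu[phi[1]] == nu[phi[2]]
    if op == "not":                       # clause 3
        return not holds(phi[1], nu, domain)
    if op == "and":                       # clause 4
        return holds(phi[1], nu, domain) and holds(phi[2], nu, domain)
    if op == "or":                        # clause 5
        return holds(phi[1], nu, domain) or holds(phi[2], nu, domain)
    if op == "imp":                       # clause 6
        return (not holds(phi[1], nu, domain)) or holds(phi[2], nu, domain)
    if op == "forall":                    # clause 7: every x the variable ranges over
        return all(holds(phi[2], {**nu, phi[1]: x}, domain) for x in domain)
    if op == "exists":                    # clause 8: some such x
        return any(holds(phi[2], {**nu, phi[1]: x}, domain) for x in domain)
    raise ValueError(op)

# forall x exists y [x = y] holds in any non-empty domain:
phi = ("forall", "x", ("exists", "y", ("eq", "x", "y")))
print(holds(phi, {}, {1, 2, 3}))          # True
```

Note how {\tt \{**nu, phi[1]: x\}} plays the role of the modified assignment $\nu↑x_\chi$.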
Although this may look complicated, it is in fact quite straightforward.
An equality statement is true (w.r.t.\ $\nu$) iff both sides of
the equality refer to the same object. The negation of a formula is true
iff the formula is false. A conjunction is true iff both conjuncts are
true and a disjunction is true iff one of the disjuncts is true. Finally,
if $\chi$ is a variable, and $\phi$ is a formula, then
$\forall \chi \phi$ is true exactly if $\phi$ is true for all allowed
interpretations of the variables differing from the fixed one only in the
assignment made to the variable $\chi$, and
$\exists \chi \phi$ is true exactly if there is some allowed
interpretation of the variables, differing from the fixed one only in the
assignment made to the variable $\chi$, under which $\phi$ is true.
The reader should try as an exercise to see why the truth of a
sentence does not depend on the particular assignment. Thus it makes sense to
say that a sentence is true in a structure without ever mentioning variable
assignments.
Those who are familiar with the lambda calculus should note
that $\lambda$ is being used here in a very limited way. Namely, the variables
in a lambda-expression take only elements of the domain as values,
whereas the essence of the lambda calculus is that they take arbitrary
functions as values. We may call these restricted lambda expressions
{\it first-order lambdas}.
\hbox to\hsize{\hss{\bf The Deduction System}\hss}
As we said at the beginning of this section, one of the main reasons
for introducing this formalism is that we can characterize what we regard
as a proof. The idea is that we shall start with certain given facts
about our structure $\Re$, these being sentences in the first order
logic of $\Re$. From these, using certain rules of deduction, we shall prove
other facts concerning $\Re$. In our case the sentences that we will be most
interested in will express properties of operations or programs. We shall leave
the discussion of non-logical axioms and the main proof principle until a later
section. What concerns us here are the very basic principles of deduction in
first order logic.
We shall divide our discussion of the axioms into four parts, namely
boolean axioms, quantifier axioms, equality axioms and lambda axioms. These four
sets of axioms together with two rules of inference constitute the logical
deduction system for our version of first order logic.
{\bf Boolean axioms:} We are quite generous in what we shall count as a
boolean or propositional axiom. But first we need to explain what a
tautology is.
A propositional formula is a formula
that is built up from propositional constants $p, q, r, \ldots$
using the boolean connectives $\neg, \lor, \land$ and $\to$. A
propositional formula is said to be a {\it tautology}
if every assignment of truth values, \qtrue\ and \qfalse, to the propositional
atoms in it makes the whole formula true. The simplest way to determine
whether a propositional formula is a tautology is to use truth tables,
the meanings of the connectives being defined by the following table.
$$\matrix{
p&q&[\neg p]&[p\land q]&[p\lor q]&[p\to q]\cr
\qtrue&\qtrue&\qfalse&\qtrue&\qtrue&\qtrue\cr
\qtrue&\qfalse&\qfalse&\qfalse&\qtrue&\qfalse\cr
\qfalse&\qtrue&\qtrue&\qfalse&\qtrue&\qtrue\cr
\qfalse&\qfalse&\qtrue&\qfalse&\qtrue&\qtrue\cr}$$
Now every formula $\phi$ in the language of $\Re$ which can be obtained
from a propositional formula $\psi$ by uniformly and simultaneously
substituting formulas of $\Re$ for the propositional atoms in $\psi$
is a logical axiom of our system. This is a rather large collection
of axioms, nevertheless there is a mechanical procedure to decide
whether a formula of $\Re$ belongs to this group of axioms. The main purpose
of these axioms is to free us from the rather trivial task of manipulating
the boolean operations of our logic.
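The mechanical procedure mentioned above can be sketched as a truth-table check: enumerate every assignment of truth values to the atoms and verify that the formula comes out true in each row. The encoding of propositional formulas as tagged tuples is our own assumption.

```python
# A truth-table tautology checker for propositional formulas.
from itertools import product

def atoms(phi, acc=None):
    """Collect the propositional atoms (strings) occurring in phi."""
    acc = set() if acc is None else acc
    if isinstance(phi, str):
        acc.add(phi)
    else:
        for sub in phi[1:]:
            atoms(sub, acc)
    return acc

def value(phi, row):
    """Truth value of phi under one row of the truth table."""
    if isinstance(phi, str):
        return row[phi]
    op = phi[0]
    if op == "not": return not value(phi[1], row)
    if op == "and": return value(phi[1], row) and value(phi[2], row)
    if op == "or":  return value(phi[1], row) or value(phi[2], row)
    if op == "imp": return (not value(phi[1], row)) or value(phi[2], row)
    raise ValueError(op)

def tautology(phi):
    """True iff every assignment of truth values makes phi true."""
    names = sorted(atoms(phi))
    return all(value(phi, dict(zip(names, vals)))
               for vals in product([True, False], repeat=len(names)))

print(tautology(("imp", "p", ("or", "p", "q"))))   # True:  p -> [p or q]
print(tautology(("imp", ("or", "p", "q"), "p")))   # False
```

Substituting formulas of $\Re$ for the atoms of any formula that passes this check yields a boolean axiom of the system.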
{\bf Quantifier axioms:} The main reason that we introduced sorted variables
was to simplify the formulas that we will have to work with later. For
example, to express the fact that every element of domain $A_i$ has property
$P$ we can write $\forall x_i P(x_i)$ rather than $\forall x[ A_iP[x] \to P(x)]$.
However, sorted variables make our axioms for introducing and eliminating quantifiers somewhat
more complex. Consequently we will first explain the axioms for quantifiers using
only the general variables, then explain the necessary modifications in the sorted
case. Also notice that $\exists x \phi$ has exactly the same meaning as the
formula $\neg \forall x [ \neg \phi]$, which is why we will only state axioms
for the quantifier $\forall$, treating the quantifier $\exists$ purely as an
abbreviation for the corresponding form involving $\forall$.
1. If $\phi$ and $\psi$ are formulas of $\Re$ and $x$ is a general variable
not free in $\phi$, then the formula
$$\forall x [ \phi \to \psi ] \to [ \phi \to \forall x \psi]$$
\noindent is an axiom.
2. If $\phi$ and $\psi$ are formulas and $\psi$ is obtained from $\phi$ by
substituting each free occurrence of the general variable $x$ in $\phi$ by the
term $t$, then as long as no variable free in $t$ becomes bound by this
procedure
$$[\forall x \phi] \to \psi$$
\noindent is an axiom.
The first set of axioms corresponds to universal introduction. The idea is
that if we derive $\psi$ from $\phi$, and $\phi$ contains no special
assumptions about the nature of $x$, then we can actually conclude $\forall x \psi$
from $\phi$ (here we are also using one of our rules of inference, which we will
describe later). This first set of axioms does not need to be modified to
handle sorted variables; simply replace the word {\it general} by the word
{\it sorted} and the axioms remain valid.
The second set is rather more
troublesome. These axioms correspond to quantifier elimination
in that they allow us to conclude from $\forall x P(x)$ that $P(t)$ is true,
for any term $t$ (as long as the free variables that occur in $t$ do not
conflict with the bound variables in $P$). The problem that arises with sorted
variables is that from $\forall x_i P(x_i)$ we should only be able to conclude
that $P(t)$ is true for those terms $t$ whose value lies in the particular
domain in question, in this case $A_i$.
For this reason we add the following axioms to the second set of axioms.
3. If $\phi$ and $\psi$ are formulas and $\psi$ is obtained from $\phi$ by
substituting each free occurrence of the sorted variable $x_i$ in $\phi$ by the
term $t$, then as long as no variable free in $t$ becomes bound by this
procedure
$$[[\forall x_i \phi] \land A_iP[t]] \to \psi$$
\noindent is an axiom.
We remark that the sentences $\forall x_i A_iP[x_i]$ are also axioms, since they
simply express the conventions concerning our sorted variables.
{\bf Equality axioms:} These are very simple axioms that express the basic
properties of equality. Suppose $x$, $y$ and $v$ are any variables and that
$t$ is a term and $\phi$ is a formula; then the following are axioms.
1. $x = x$
2. $[x = y] \to [ t_v↑x = t_v↑y]$
3. $[x = y] \to [ \phi_v↑x \to \phi_v↑y]$
\noindent where $t_v↑x$ and $\phi_v↑x$ denote the result of uniformly
substituting the variable $x$ for $v$.
{\bf Lambda axioms:}
The only rule required for handling lambda-expressions
in first order logic is called {\it lambda-conversion}. It is essentially
one of
$[\lambda \chi .T ][t] =$ the result of substituting $t$ for $\chi$ in $T$.
\noindent or
$[\lambda \chi. \Phi][t] \Longleftrightarrow$ the result of substituting $t$ for $\chi$ in $\Phi$.
\noindent depending on whether the lambda term is a function or a predicate.
As an example of this rule, we have
$$[\lambda x .\qa x \qcons y][u \qcons v] = [\qa [u \qcons v]] \qcons y$$
However, a complication requires modifying the rule. Namely,
we can't substitute for a variable $x$ an expression $e$ that has a free variable
$x_1$ into a context in which $x_1$ is bound. Thus it would be
wrong to substitute $x_1+x_2$ for $x$ in $\forall x_1 [x + x_1 = x_2]$ or into
the term $[\lambda x_1 x + x_1][u + v]$. Before doing the substitution, the
variable $x_1$ would have to be replaced in all its bound occurrences by
a new variable.
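The renaming step just described is the heart of capture-avoiding substitution, and it can be sketched in a few lines. The encoding below ({\tt ("lam", var, body)} for lambda terms, strings for variables, other tagged tuples for applications and operations) is our own assumption.

```python
# Capture-avoiding substitution: subst(t, x, s) replaces free x in t by s,
# renaming any bound variable of t that occurs free in s.
from itertools import count

fresh = ("v%d" % i for i in count())     # supply of new variable names

def free_vars(t):
    if isinstance(t, str):
        return {t}
    if t[0] == "lam":
        return free_vars(t[2]) - {t[1]}  # the bound variable is not free
    return set().union(*map(free_vars, t[1:]))

def subst(t, x, s):
    if isinstance(t, str):
        return s if t == x else t
    if t[0] == "lam":
        v, body = t[1], t[2]
        if v == x:                       # x is bound here: nothing to do inside
            return t
        if v in free_vars(s):            # s's free variable would be captured:
            w = next(fresh)              # rename the bound variable first
            body = subst(body, v, w)
            v = w
        return ("lam", v, subst(body, x, s))
    return (t[0],) + tuple(subst(u, x, s) for u in t[1:])
```

For instance, substituting $y$ for $x$ in $[\lambda y . x]$ first renames the bound $y$, so the free $y$ being substituted in is not captured.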
Lambda-expressions can always be eliminated from sentences and
terms by lambda-conversion, but the expression may increase greatly
in length if a lengthy term replaces a variable that occurs more than
once in the body. It is easy to make an expression of length proportional to $n$ whose
length after conversion of its $n$ nested lambda-expressions is proportional to $2↑n$.
For example
$$[\lambda x_1 .x_1\qcons x_1][ \ldots [\lambda x_n .x_n \qcons x_n][A]] \ldots ]$$
becomes
{\sx (A . A)},
{\sx ((A . A) . (A . A))},
or
{\sx ((((A . A) . (A . A)) . ((A . A) . (A . A))) . (((A . A) . (A . A)) . ((A . A) . (A . A))))}
for $n = 1, 2,$ or $4$ respectively.
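The blow-up claim is easy to check by computation. Each conversion of $[\lambda x . x \qcons x]$ duplicates its argument, so after $n$ nested conversions the result contains $2↑n$ copies of {\sx A}. The encoding of dotted pairs as Python tuples is an assumption for illustration.

```python
# Check the 2^n blow-up from converting n nested lambda-expressions.
def expand(n):
    """Value of [lambda x1 . x1 cons x1][ ... [lambda xn . xn cons xn][A] ... ]."""
    val = "A"
    for _ in range(n):
        val = (val, val)         # each conversion duplicates the argument
    return val

def count_atoms(v):
    """Number of occurrences of the atom A in the resulting S-expression."""
    return 1 if v == "A" else count_atoms(v[0]) + count_atoms(v[1])

for n in [1, 2, 4]:
    print(n, count_atoms(expand(n)))   # 2, 4 and 16 atoms: 2^n
```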
The use of $\lambda$-expressions in writing programs also may improve
efficiency, by specifying that some subexpression be evaluated once and
the value used in several places. Where there are side-effects, of
course, more than efficiency is affected.
This completes our introduction to first order logic.
\end{document}